Words, Numbers and All That: the Lexicon in Sentence Understanding
Abstract
1 Cross-disciplinary Issues in Lexical Theories
It is hardly a controversial statement that the acquisition and processing of language require knowledge of its words. Yet, the type and use of information encoded in a lexical entry, the relation of words to each other in the lexicon, and the relationship of the lexicon to the grammar, are complex and unsettled issues, on which researchers hold very different views. While there is a recent consensus that mechanisms operating in the lexicon are not substantially different from those operating in the syntax, there are differences on whether the syntax is in the lexicon, or the lexicon is in the syntax. On the one view, the lexicon is a static repository of very rich representations, which regulate the composition of words to an extent that goes beyond the phrase, while on the other view the lexicon is dynamically generated as a result of composition and competition mechanisms, largely syntactic in nature. Roughly speaking, computational linguistics and psycholinguistics in general follow the first view, and the two fields are converging on some similar lexicalised, probabilistic models of grammars. Theoretical linguistics has recently proposed models of the latter type.
In computational linguistics, work on the lexicon has stemmed from two different areas of research: parsing and grammar formalisms, and construction of electronic databases (lexicography). In the area of parsing, the interest in probabilistic models and lexicalised grammars did not develop simultaneously. Parsers based on probabilistic context-free grammars were motivated by the difficulties in building robust, large-scale systems using the explicit representation of linguistic knowledge. Large corpus annotation efforts and the creation of tree-banks (text corpora annotated with syntactic structures) enabled researchers to develop and automatically train probabilistic models of syntactic disambiguation (Marcus, Santorini, and Marcinkiewicz 1993).
In an attempt to take advantage of the insights gained in the area of statistical speech processing, computational linguists initially adopted very simplified statistical models of grammar and parsing, abandoning the more sophisticated lexicalised feature-based formalisms. However, it soon became apparent that the success of probabilistic context-free grammars was limited by the strong (and incorrect) assumption of probabilistic independence of
Related resources
First Language Activation during Second Language Lexical Processing in a Sentential Context
Lexicalization patterns, the way words are mapped onto concepts, differ from one language to another. This study investigated the influence of first language (L1) lexicalization patterns on the processing of second language (L2) words in sentential contexts by both less proficient and more proficient Persian learners of English. The focus was on cases where two different senses of a polys...
Our system for annotation of articles is named “Text Detective”
Text Detective is then able to tag every word in the sentence according to biologically relevant categories. For instance, chemical compounds are recognized and labelled. The identification of “central words” (also known as “core terms”) is a key step in this process (words such as “receptor”, “kinase”, “transporter”, etc.). For this purpose, we have built a lexicon and used some carefully curated...
Iranian EFL Learners’ Lexical Inferencing Strategies at Both Text and Sentence Levels
Lexical inferencing is one of the most important strategies in vocabulary learning and it plays an important role in dealing with unknown words in a text. In this regard, the aim of this study was to determine the lexical inferencing strategies used by Iranian EFL learners when they encounter unknown words at both text and sentence levels. To this end, forty lower intermediate students were div...
Non-Literal Word Sense Identification Through Semantic Network Path Schemata
When computer programs disambiguate words in a sentence, they often encounter non-literal or novel usages not included in their lexicon. In a recent study, Georgia Green (personal communication) estimated that 17% to 20% of the content word senses encountered in various types of normal English text are not listed in the dictionary. While these novel word senses are generally valid, they occur i...
Design and Implementation of a Software System for Detecting Orthographical or Morphological Errors in Persian Words
This paper presents a new method for analyzing words in the Persian language context to find orthographical and structural errors regardless of the meaning. This technique tokenizes each word in a statement, then tries to detect the kind of word, and analyses its correctness in terms of orthography and morphology by means of a lexicon. It should be noted that some words in the Persian language h...
Logical Structures in the Lexicon
The lexical entry for a word must contain all the information needed to construct a semantic representation for sentences that contain the word. Because of that requirement, the formats for lexical representations must be as detailed as the semantic forms. Simple representations, such as features and frames, are adequate for resolving many syntactic ambiguities. But since those notations cannot...